Locally Differentially Private (Contextual) Bandits Learning
We study locally differentially private (LDP) bandits learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with an LDP guarantee. Based on our frameworks, we improve the previous best results for private bandits learning with one-point feedback, such as private Bandits Convex Optimization, and obtain the first results for Bandits Convex Optimization (BCO) with multi-point feedback under LDP. The LDP guarantee and black-box nature make our frameworks more attractive in real applications than previous specifically designed, and relatively weaker, differentially private (DP) algorithms. Further, we also extend our algorithm to Generalized Linear Bandits with regret bound $\tilde{\mathcal{O}}(T^{3/4}/\varepsilon)$ under $(\varepsilon, \delta)$-LDP, and it is conjectured to be optimal. Note that given the existing $\Omega(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffet, NeurIPS 2018), our result shows a fundamental difference between LDP and DP for contextual bandits.
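The black-box reduction can be pictured as a local randomizer sitting between each user and the server: the user privatizes its own feedback before release, and the server runs an ordinary bandit algorithm on the noisy values. Below is a minimal sketch of such a randomizer, assuming a bounded scalar reward and the standard Gaussian mechanism calibrated for $(\varepsilon, \delta)$-LDP; the function name and sensitivity bound are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gaussian_ldp_randomizer(value, eps, delta, sensitivity=2.0, rng=None):
    """Perturb a bounded scalar reward locally before sending it to the server.

    Assumes the reward lies in [-1, 1] (so L2-sensitivity <= 2) and uses the
    standard Gaussian-mechanism noise scale for (eps, delta)-LDP.
    """
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + rng.normal(0.0, sigma)

# Server side of the reduction (sketch): any bandit learner consumes the
# privatized rewards exactly as if they were the true (now noisier) feedback.
```

Because the noise is zero-mean, the privatized feedback is an unbiased but higher-variance version of the true reward, which is why regret degrades by a factor depending on $1/\varepsilon$ rather than breaking entirely.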
Bandit Learning with Implicit Feedback
Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence of users' evaluation of the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our upper regret bound analysis of the proposed algorithm proves the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its learning effectiveness in practice.
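The examination/relevance decomposition can be sketched concretely: a click fires only if the user both examines the result and finds it relevant, each modeled as a logistic function of the context, and Thompson sampling draws parameters from an approximate posterior before scoring arms. The Gaussian posteriors and all names below are illustrative assumptions, not the paper's actual variational updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def click_probability(x, theta_exam, theta_rel):
    """Examination-hypothesis click model: a click requires the user to both
    examine the result and judge it relevant, each logistic in the context x."""
    return sigmoid(x @ theta_exam) * sigmoid(x @ theta_rel)

def thompson_select(arms, mu_exam, cov_exam, mu_rel, cov_rel, rng):
    """Thompson-sampling arm choice: draw both parameter vectors from
    (assumed Gaussian) approximate posteriors, then pick the arm that
    maximizes the sampled click probability."""
    te = rng.multivariate_normal(mu_exam, cov_exam)
    tr = rng.multivariate_normal(mu_rel, cov_rel)
    scores = [click_probability(x, te, tr) for x in arms]
    return int(np.argmax(scores))
```

The key point the model captures is that a non-click is ambiguous: it may mean low relevance, or simply that the examination factor was small.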
Bandit Learning in Concave N-Person Games
This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
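The single-agent building block behind this kind of bandit learning is the one-point gradient estimator: play a perturbed action, observe a single payoff value, and rescale the perturbation direction into a gradient surrogate for a smoothed payoff. Below is a minimal Euclidean sketch, where projected ascent stands in for the paper's general mirror-descent machinery; the step size, perturbation radius, and function names are illustrative assumptions.

```python
import numpy as np

def one_point_gradient_ascent(payoff, x0, radius=1.0, T=2000, delta=0.05,
                              eta=0.01, seed=0):
    """Sketch of bandit mirror descent (Euclidean mirror map) with one-point
    feedback: query the payoff once at a perturbed point, form a gradient
    estimate, take a projected ascent step inside the ball of given radius."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    x = np.array(x0, dtype=float)
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)             # uniform direction on the sphere
        reward = payoff(x + delta * u)     # the only feedback observed
        g_hat = (d / delta) * reward * u   # one-point gradient estimate
        x = x + eta * g_hat                # ascent step
        n = np.linalg.norm(x)
        if n > radius - delta:             # keep the perturbed query feasible
            x *= (radius - delta) / n
    return x
```

In the game setting, each player runs such an update on its own payoff; the monotonicity condition in the abstract is what makes these independent trajectories jointly converge to Nash equilibrium rather than cycle.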
Improved Analysis for Bandit Learning in Matching Markets
A rich line of work studies the bandit learning problem in two-sided matching markets, where one side of the market participants (players) are uncertain about their preferences and hope to find a stable matching during iterative matchings with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order $O(K\log T/\Delta^2)$, where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the players' minimum preference gap. However, this result may be far from the lower bound $\Omega(\max\{N\log T/\Delta^2, K\log T/\Delta\})$, since the number $K$ of arms (workers, publisher slots) may be much larger than the number $N$ of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by $O(N^2\log T/\Delta^2 + K\log T/\Delta)$. This result removes the dependence on $K$ in the main order term and improves the state-of-the-art guarantee in common cases where $N$ is much smaller than $K$. Such an advantage is also verified in experiments.
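The stability notion behind the regret definition is the classical Gale-Shapley one: with known preferences, player-proposing deferred acceptance yields the player-optimal stable matching, and the bandit problem is to approach it when players must learn their rankings from noisy feedback. A minimal known-preferences sketch (not the paper's algorithm; names are illustrative, and complete preference lists with $K \ge N$ are assumed):

```python
def deferred_acceptance(player_prefs, arm_prefs):
    """Player-proposing Gale-Shapley deferred acceptance.

    player_prefs[p] is p's ranked list of arms (best first);
    arm_prefs[a] is a's ranked list of players. Returns the
    player-optimal stable matching as {player: arm}.
    """
    # Precompute each arm's rank of each player for O(1) comparisons.
    rank = {a: {p: r for r, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # next arm to propose to
    match = {}                                  # arm -> tentatively held player
    free = list(player_prefs)
    while free:
        p = free.pop()
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in match:
            match[a] = p                        # arm tentatively accepts
        elif rank[a][p] < rank[a][match[a]]:
            free.append(match[a])               # arm trades up, old player freed
            match[a] = p
        else:
            free.append(p)                      # proposal rejected
    return {p: a for a, p in match.items()}
```

In the bandit version, players do not know `player_prefs` and must estimate the rankings from stochastic rewards; the preference gap $\Delta$ controls how many samples are needed before the learned rankings stop producing unstable proposals.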
Reviews: Bandit Learning with Positive Externalities
The paper studies the interesting problem of learning with externalities in a multi-armed bandit (MAB) setting. The main idea is that there might be a bias in the preferences of the users arriving at on-line platforms. Specifically, future user arrivals are likely to have similar preferences to users who have previously accessed the same platform and were satisfied with the service. Since some on-line platforms use MAB algorithms for optimizing their service, the authors propose the Balanced Exploration (BE) MAB algorithm, which has a structured exploration strategy that takes this potential "future user preference bias" (referred to as "positive externalities") into account. The bias in the users' preferences translates directly into user-specific reward values: out of the m possible items/arms, each user has a preference for a subset of them (the reward being a Bernoulli reward with mean proportional to the popularity of the arm), and the rewards of all other arms are always null.
Reviews: Bandit Learning with Implicit Feedback
Summary: This work considers learning user preferences using a bandit model. The reward is based not only on the user's judgement of an arm, but also on whether the user examined it; that is, feedback = examination × judgement. In particular, if a user does not examine an arm, the lack of feedback does not necessarily indicate that the user does not "like" the arm. This work uses a latent model for the (unobserved) examination of arms, and posits that the probability of positive (binary) feedback can be expressed as the product of the probability of examination (logistic) and the probability of a positive relevance judgement (logistic). The work proposes a variational inference (VI) approach to estimating the parameters, and then uses Thompson Sampling from the approximate posterior as the policy. This allows the authors to use machinery from Russo and Van Roy to obtain regret bounds.
Reviews: Bandit Learning in Concave N-Person Games
Context: It is a classic result that the empirical frequencies of actions of players running regret-minimization algorithms converge to a coarse correlated equilibrium (CCE). CCEs are not necessarily desirable solution concepts because they sometimes admit irrational behavior. For monotone games, it is known that the empirical frequencies converge to a Nash equilibrium when agents play FTRL. Recently, Mertikopoulos et al. proved that the actual sequence of plays under FTRL converges to Nash for such games -- in fact, they prove something more general that goes beyond concave potential games. This work considers the case when each agent can only observe bandit feedback.